In this notebook I attempt to build a deep learning model to predict my own heart rate using values of speed, cadence (steps per minute) and altitude. The data comes from 35 runs of differing length and intensity that I ran between January and March 2021, measured with a Polar Vantage M sports watch. Measurements are taken once per second, and the data is stored in 35 separate csv files acquired from Polar Flow.
First, let's load the libraries.
import numpy as np
import csv
import os
import tensorflow.keras as K
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import regularizers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import random
rng = int(random.random() * 10000)
print("Seed was: " + str(rng))
random.seed(rng)
# the seed printed for the results reported below was 7364
Next, I define a couple of functions to help with parsing the data from the csv files. The first one gets an input string in the form "hh:mm:ss" and returns the total time in seconds. The second one imputes the missing values of heart rate to be 0. These missing values only really appear at the beginning of some of the runs and thus they can safely be ignored.
def parseTime(timestring):
    if ":" in timestring:
        stringarr = timestring.split(":")
        timeInSeconds = int(stringarr[0]) * (60 ** 2) + int(stringarr[1]) * 60 + int(stringarr[2])
        return timeInSeconds
    else:
        return 0
def parseHR(HRstring):
    if len(HRstring) > 0:
        return int(HRstring)
    else:
        return 0
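As a quick sanity check, the two parsers behave like this (restated here in slightly condensed form so the snippet is self-contained):

```python
def parseTime(timestring):
    # "hh:mm:ss" -> total seconds; strings without ":" map to 0
    if ":" in timestring:
        h, m, s = timestring.split(":")
        return int(h) * 3600 + int(m) * 60 + int(s)
    return 0

def parseHR(HRstring):
    # empty heart rate fields (missing readings) become 0
    return int(HRstring) if len(HRstring) > 0 else 0

print(parseTime("01:02:03"))  # 3723
print(parseHR(""))            # 0
```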
The next cell loops through the csv files and reads the desired data into a 3-D Python list with shape (number of runs, number of measurements, number of features). The csv files contain run metadata in the first 3 rows, which are skipped.
runningdata = []
for filename in os.listdir("Running Data\\S"):
    with open("Running Data\\S\\" + filename) as file:
        reader = csv.reader(file, delimiter=",")
        rownum = 0
        rdata = []
        for row in reader:
            if rownum >= 3:
                Time = parseTime(row[1])
                HR = parseHR(row[2])
                Speed = float(row[3])
                Cadence = float(row[5])
                Altitude = float(row[6])
                Distance = float(row[8])
                rdata.append([HR, Speed, Cadence, Altitude, Distance, Time])
            rownum += 1
    runningdata.append(rdata)
Let's plot the data from all of the runs using matplotlib. As we can see, the runs have very different heart rate profiles: some come from long steady runs and some from high-intensity interval runs. Thus the model faces a non-trivial task in predicting the heart rate accurately. Of course, many things besides speed, altitude and cadence affect the heart rate, such as running conditions and the amount of exercise on previous days.
titledict = {
    0: "HR",
    1: "Speed",
    2: "Cadence",
    3: "Altitude"
}
for runnum, run in enumerate(runningdata):
    fig, axs = plt.subplots(2, 2)
    fig.suptitle("Run Number " + str(runnum), fontsize=20)
    fig.set_size_inches(20, 10)
    plotnum = 0
    for i in range(2):
        for j in range(2):
            axs[i][j].set_title(titledict[plotnum])
            axs[i][j].plot([x[plotnum] for x in run])
            plotnum += 1
    plt.show()
Something strange happened at the end of run #5 (I probably forgot to stop the recording when I got home), so I will omit the values after 2000 seconds. With a lot more data this probably wouldn't matter much. I also impute the zero values of altitude with the previous non-zero value in the time series, or, if none exists, the next one.
runningdata[5] = runningdata[5][0:2000]
for run in runningdata:
    prevnonzero = None
    for i, step in enumerate(run):
        imputer = None
        if step[3] == 0:
            if prevnonzero is not None:
                imputer = prevnonzero
            else:
                for j in range(i + 1, len(run)):
                    if run[j][3] != 0:
                        imputer = run[j][3]
                        break
            step[3] = imputer
        else:
            prevnonzero = step[3]
plt.plot([x[3] for x in runningdata[0]])
plt.suptitle("Imputed altitude for run number 0")
plt.gcf().set_size_inches(20,5)
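The fill rule above (previous non-zero value if one exists, otherwise the next one) can be checked in isolation on toy data; `fill_zeros` here is a standalone re-implementation for illustration, not part of the notebook's pipeline:

```python
def fill_zeros(values):
    # replace each zero with the previous non-zero value,
    # or with the next non-zero value if no previous one exists
    out = list(values)
    prev = None
    for i, v in enumerate(out):
        if v == 0:
            if prev is not None:
                out[i] = prev
            else:
                out[i] = next((x for x in out[i + 1:] if x != 0), 0)
        else:
            prev = v
    return out

print(fill_zeros([0.0, 0.0, 12.5, 13.0, 0.0, 14.0]))
# [12.5, 12.5, 12.5, 13.0, 13.0, 14.0]
```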
The runs are sampled randomly into training, validation and testing runs: 5 runs for validation, 3 runs for testing and the rest for training. I also define two hyperparameters for the models: the lookback, i.e. the number of past values of speed, cadence and altitude the model uses as features (300 here, corresponding to the last five minutes of running), and padding, which controls whether to pad the missing predictor values for the first five minutes of each run.
The data splits are printed below.
validate_runs = random.sample(list(range(len(runningdata))), 5)
test_runs = random.sample([x for x in list(range(len(runningdata))) if x not in validate_runs], 3)
train_runs = [x for x in list(range(len(runningdata))) if x not in validate_runs and x not in test_runs]
lookback_amount = 300
padding = True
print("Train runs: " + str(train_runs))
print("Validate runs: " + str(validate_runs))
print("Test runs: " + str(test_runs))
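The same sampling logic can be checked standalone: the three index lists should be disjoint and together cover all 35 runs (the sampled indices below are illustrative, not the actual split):

```python
import random

n_runs = 35
indices = list(range(n_runs))
validate = random.sample(indices, 5)
test = random.sample([x for x in indices if x not in validate], 3)
train = [x for x in indices if x not in validate and x not in test]

# the three lists partition the run indices
assert len(train) == 27
assert set(train) | set(validate) | set(test) == set(indices)
assert set(train).isdisjoint(validate) and set(validate).isdisjoint(test)
```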
Next, I use nested for loops to create the time series for training and testing the model. Each timestep in the series contains data from the previous 300 steps. As stated above, the variable "padding" controls whether to include the first 300 seconds of each run in the dataset, using an extreme value (-1e3) as padding for the missing values.
targets_train = []
predictors_train = []
targets_test = []
predictors_test = []
run_indices_train = []
run_indices_test = []
targets_validate = []
predictors_validate = []
run_indices_validate = []
for runnum, run in enumerate(runningdata):
    for i, timestep in enumerate(run):
        skip = False
        speeds = []
        cadences = []
        altitudes = []
        if i >= lookback_amount:
            for j in range(i - lookback_amount, i):
                speeds += [run[j][1]]
                cadences += [run[j][2]]
                altitudes += [run[j][3]]
        elif padding:
            for j in range(lookback_amount - i):
                speeds += [-1e3]
                cadences += [-1e3]
                altitudes += [-1e3]
            for j in range(i):
                speeds += [run[j][1]]
                cadences += [run[j][2]]
                altitudes += [run[j][3]]
        else:
            skip = True
        if not skip:
            current_predictors = speeds + cadences + altitudes
            if runnum in train_runs:
                targets_train.append(timestep[0])
                predictors_train.append(current_predictors)
                run_indices_train.append(runnum)
            elif runnum in test_runs:
                targets_test.append(timestep[0])
                predictors_test.append(current_predictors)
                run_indices_test.append(runnum)
            else:
                targets_validate.append(timestep[0])
                predictors_validate.append(current_predictors)
                run_indices_validate.append(runnum)
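As an aside, the unpadded windows could also be built vectorized with NumPy's `sliding_window_view` instead of nested Python loops; here is a sketch on a toy single-feature run, not a drop-in replacement for the code above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

lookback = 4                         # stands in for lookback_amount = 300
speed = np.arange(10, dtype=float)   # toy speed series with 10 timesteps

# one row per prediction time i >= lookback, holding speed[i-lookback:i]
windows = sliding_window_view(speed, lookback)[:-1]
print(windows.shape)  # (6, 4): 10 - 4 = 6 usable timesteps
print(windows[0])     # [0. 1. 2. 3.] -> features for predicting step 4
```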
Here I convert the nested lists to NumPy arrays and reshape them into the shape Keras expects for neural network input.
targets_train = np.array(targets_train)
targets_train = targets_train.reshape((-1, 1))
predictors_train = np.array(predictors_train)
targets_test = np.array(targets_test)
targets_test = targets_test.reshape((-1, 1))
predictors_test = np.array(predictors_test)
targets_validate = np.array(targets_validate)
targets_validate = targets_validate.reshape((-1, 1))
predictors_validate = np.array(predictors_validate)
run_indices_test = np.array(run_indices_test)
run_indices_train = np.array(run_indices_train)
run_indices_validate = np.array(run_indices_validate)
print("Training set")
print(targets_train.shape)
print(predictors_train.shape)
print("Validation set")
print(targets_validate.shape)
print(predictors_validate.shape)
print("Testing set")
print(targets_test.shape)
print(predictors_test.shape)
I rescale the values of the predictors to have mean 0 and standard deviation 1 to speed up and stabilize training.
scaler = StandardScaler()
predictors_train = scaler.fit_transform(predictors_train)
predictors_test = scaler.transform(predictors_test)
predictors_validate = scaler.transform(predictors_validate)
predictors_validate = predictors_validate.reshape(predictors_validate.shape[0], -1, 1)
predictors_test = predictors_test.reshape(predictors_test.shape[0], -1, 1)
predictors_train = predictors_train.reshape(predictors_train.shape[0], -1, 1)
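For reference, `StandardScaler` learns a per-feature mean and standard deviation on the training set and applies that same affine transform to the validation and test sets; the equivalent NumPy arithmetic on toy data:

```python
import numpy as np

train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mu, sigma = train.mean(axis=0), train.std(axis=0)

# fit_transform: standardize the training data itself
scaled = (train - mu) / sigma
print(scaled.mean(axis=0))  # ~[0, 0]
print(scaled.std(axis=0))   # ~[1, 1]

# transform: validation/test data reuse the *training* statistics
val = np.array([[2.0, 20.0]])
val_scaled = (val - mu) / sigma
```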
The modeling framework is as follows: the metric used to choose the best model is MSE, which also serves as the loss function.
First, I set a baseline of accuracy for the deep learning models by constructing a linear model with zero hidden layers. The linear model gives a reasonable baseline of performance for the more complex models to improve upon.
baseModel = models.Sequential()
baseModel.add(layers.Flatten())
baseModel.add(layers.Dense(1))
baseModel.build(predictors_train.shape)
early_stop = K.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0.1,
    patience=15,
    verbose=0,
    mode="min",
    restore_best_weights=True,
)
baseModel.compile(optimizer = K.optimizers.Adam(learning_rate = 0.01), loss = K.losses.MeanSquaredError(), metrics = ["mae"])
The linear model simply minimizes the cost function $\frac{1}{n} \sum_{j=1}^n \left(y_j - \left(\sum_{i=1}^{m} w_i x_{ji} + b\right)\right)^2$, where $n$ is the number of samples, $m$ is the number of features and $y_j$ is the value of the target variable for sample $j$. The linear model has 901 parameters, corresponding to $300 \times 3$ weights of the predictors and 1 bias term. The model is trained in batches of 128 for at most 150 epochs. The loss function is mean squared error, and the adaptive moment estimation (Adam) algorithm with learning rate 0.01 is used for optimization. The mean absolute error is also monitored for curiosity's sake. I apply early stopping on the validation MSE with a patience of 15 epochs to find the best model.
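The 901-parameter count and the cost function can be verified with plain NumPy on random toy data (the names below are illustrative, not from the notebook):

```python
import numpy as np

m, n = 900, 5                     # 3 features x lookback of 300; toy batch of 5
rng = np.random.default_rng(0)
X, y = rng.normal(size=(n, m)), rng.normal(size=(n, 1))
w, b = rng.normal(size=(m, 1)), 0.0

# mean squared error of the linear model y_hat = Xw + b
mse = np.mean((y - (X @ w + b)) ** 2)

n_params = m * 1 + 1              # 900 weights + 1 bias term
print(n_params)  # 901
```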
baseModel.summary()
baseFit = baseModel.fit(predictors_train, targets_train, validation_data = (predictors_validate, targets_validate), batch_size = 128, epochs = 150, shuffle = True, callbacks = [early_stop])
The training and validation losses as well as the learning curves of the model are shown below. The model achieves its best validation MSE (about 257.8) after 33 epochs. The corresponding mean absolute error is about 13.24. This sets the baseline for model performance.
baseModel.evaluate(predictors_train, targets_train)
baseModel.evaluate(predictors_validate, targets_validate)
fig, axs = plt.subplots(1,2)
fig.set_size_inches(20,5)
axs[0].plot(baseFit.history["loss"], label = "loss")
axs[0].plot(baseFit.history["val_loss"], label = "val loss")
axs[1].plot(baseFit.history["mae"], label = "mae")
axs[1].plot(baseFit.history["val_mae"], label = "val mae")
axs[0].legend()
axs[1].legend()
Here I define a function for printing predictions of the model against the true heart rate values.
def plotPreds(preds, targets, runs, run_indices, labels=("Model Prediction",)):
    fig, axs = plt.subplots(len(runs))
    fig.set_size_inches(15, 15)
    for run, ax in zip(runs, axs):
        ax.plot(targets[run_indices == run, ], label="Heart Rate")
        for i, pred in enumerate(preds):
            ax.plot(pred[run_indices == run, ], label=labels[i])
        ax.legend()
        ax.set_title("Run number " + str(run))
        ax.set_ylim((80, 190))
The predictions of the linear model for the validation set look like this.
basePreds = baseModel.predict(predictors_validate)
plotPreds((basePreds,), targets_validate, validate_runs, run_indices_validate)
The linear model seems to fit the data decently well, although there is some clear room for improvement. Here are the MSE and MAE values for the different validation runs.
for run in validate_runs:
    print("Run number " + str(run))
    baseModel.evaluate(predictors_validate[run_indices_validate == run, ], targets_validate[run_indices_validate == run, ])
Next, I define the architecture for a deeper neural network. After trying multiple hyperparameter combinations I decided on a fully connected network with four hidden layers of 512, 256, 128 and 32 units respectively. The hidden layers use ReLU as their activation function, whereas the output layer uses the identity function. I added L1 and L2 regularization to each hidden layer, as well as 30% dropout after the hidden layers, to prevent overfitting. I chose the Adam optimizer with learning rate 0.0005. The same early stopping criterion is applied as above.
deepModel = models.Sequential()
deepModel.add(layers.Flatten())
deepModel.add(layers.Dense(512, kernel_regularizer = regularizers.l1_l2(l1 = 1e-1, l2 = 1e-1)))
deepModel.add(layers.ReLU())
deepModel.add(layers.Dropout(0.3))
deepModel.add(layers.Dense(256, kernel_regularizer = regularizers.l1_l2(l1 = 1e-1, l2 = 1e-1)))
deepModel.add(layers.ReLU())
deepModel.add(layers.Dropout(0.3))
deepModel.add(layers.Dense(128, kernel_regularizer = regularizers.l1_l2(l1 = 1e-1, l2 = 1e-1)))
deepModel.add(layers.ReLU())
deepModel.add(layers.Dropout(0.3))
deepModel.add(layers.Dense(32, kernel_regularizer = regularizers.l1_l2(l1 = 1e-1, l2 = 1e-1)))
deepModel.add(layers.ReLU())
deepModel.add(layers.Dense(1))
deepModel.build(predictors_train.shape)
deepModel.compile(
    optimizer=K.optimizers.Adam(learning_rate=5e-4),
    loss=K.losses.MeanSquaredError(),
    metrics=["mae"])
The model has 629 697 trainable parameters, divided into layers as listed below.
deepModel.summary()
I train the model using a batch size of 128 with a maximum number of epochs at 500.
deepFit = deepModel.fit(predictors_train, targets_train, validation_data = (predictors_validate, targets_validate), batch_size = 128, epochs = 500, shuffle = True, callbacks = [early_stop])
Somewhat surprisingly, the model achieves a better MSE on the validation data than on the training data. This would imply that the validation data happened to contain "easier" heart rate values to predict, but the same phenomenon could be observed with different train/test splits as well. The model achieved its best validation MSE of about 216.6 after 63 epochs, a solid improvement over the base model. The corresponding MAE is about 10.34. Here are the loss values per epoch for the training and validation sets:
deepModel.evaluate(predictors_train, targets_train)
deepModel.evaluate(predictors_validate, targets_validate)
fig, axs = plt.subplots(1,2)
fig.set_size_inches(20, 5)
axs[0].plot(deepFit.history["loss"], label = "loss")
axs[0].plot(deepFit.history["val_loss"], label = "val loss")
axs[1].plot(deepFit.history["mae"], label = "mae")
axs[1].plot(deepFit.history["val_mae"], label = "val mae")
axs[0].legend()
axs[1].legend()
Here are the predictions for the runs in the validation dataset plotted with the real heart rate values. The plots suggest that this model is better at predicting changes in the heart rate than the linear model.
predsDeep = deepModel.predict(predictors_validate)
plotPreds((predsDeep,), targets_validate, validate_runs, run_indices_validate)
Here, the model performance is evaluated separately for each validation run. Run number 4 stands out with a much higher MSE than the other runs.
for run in validate_runs:
    print("Run number " + str(run))
    deepModel.evaluate(predictors_validate[run_indices_validate == run, ], targets_validate[run_indices_validate == run, ])
Finally, I define a convolutional neural network architecture. A convolutional model makes sense because the convolutional layers can reuse features learned from one part of the input in other parts: an increase in altitude (an uphill) should have an increasing effect on the heart rate regardless of whether it happened 20 or 100 seconds ago. The network has two convolutional layers followed by one hidden fully connected layer, totaling about 2.7 million parameters. The complete model description is listed below. The Adam algorithm with learning rate 0.0001 is used with a batch size of 128, and the same early stopping process as before.
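The roughly 2.7 million parameters can be accounted for by hand, assuming an input length of 900 (3 features x 300-step lookback) and Keras's default 'valid' padding with stride 1:

```python
# Conv1D output length with 'valid' padding, stride 1: L_out = L_in - kernel + 1
L0 = 900                       # 3 features x 300-step lookback, one channel
L1 = L0 - 60 + 1               # first conv, kernel size 60 -> 841
L2 = L1 - 20 + 1               # second conv, kernel size 20 -> 822

conv1 = 60 * 1 * 50 + 50       # kernel x in_channels x filters + biases
conv2 = 20 * 50 * 100 + 100
dense1 = L2 * 100 * 32 + 32    # flattened 822 x 100 activations into 32 units
dense2 = 32 * 1 + 1            # output layer

total = conv1 + conv2 + dense1 + dense2
print(total)  # 2733615
```

Almost all of the parameters sit in the dense layer after flattening, which is typical for this kind of conv-then-dense architecture.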
convModel = models.Sequential()
convModel.add(layers.Conv1D(50, 60, 1))
convModel.add(layers.ReLU())
convModel.add(layers.Dropout(0.2))
convModel.add(layers.Conv1D(100, 20, 1))
convModel.add(layers.ReLU())
convModel.add(layers.Dropout(0.2))
convModel.add(layers.Flatten())
convModel.add(layers.Dense(32))
convModel.add(layers.ReLU())
convModel.add(layers.Dropout(0.3))
convModel.add(layers.Dense(1))
convModel.build(predictors_train.shape)
convModel.compile(
    optimizer=K.optimizers.Adam(learning_rate=1e-4),
    loss=K.losses.MeanSquaredError(),
    metrics=["mae"])
convModel.summary()
convFit = convModel.fit(predictors_train, targets_train, validation_data = (predictors_validate, targets_validate), batch_size = 128, epochs = 500, shuffle = True, callbacks = [early_stop])
Again, the validation set gets a lower MSE than the training set, clocking in at about 121.2 after 57 epochs. The corresponding MAE is 8.69, a significant improvement.
convModel.evaluate(predictors_train, targets_train)
convModel.evaluate(predictors_validate, targets_validate)
fig, axs = plt.subplots(1,2)
fig.set_size_inches(20,5)
axs[0].plot(convFit.history["loss"], label = "loss")
axs[0].plot(convFit.history["val_loss"], label = "val loss")
axs[1].plot(convFit.history["mae"], label = "mae")
axs[1].plot(convFit.history["val_mae"], label = "val mae")
axs[0].legend()
axs[1].legend()
Here are the predictions plotted for the validation runs. Every run has a better MSE than with the fully connected network, although run #4 again stands out with a high MSE. Based on these observations, the convolutional model is chosen as the best model.
predsConv = convModel.predict(predictors_validate)
plotPreds((predsConv,), targets_validate, validate_runs, run_indices_validate)
for run in validate_runs:
    print("Run number " + str(run))
    convModel.evaluate(predictors_validate[run_indices_validate == run, ], targets_validate[run_indices_validate == run, ])
Here are the predictions for the different models plotted against the true heart rate values in the validation runs.
plotPreds((basePreds, predsDeep, predsConv), targets_validate, validate_runs, run_indices_validate, ("Base", "Deep FC", "CONV"))
Here, I test the chosen best model, the convolutional network, on the test set runs. The model achieves an MSE of about 143.28, somewhat higher than the validation MSE. The corresponding MAE is about 9.61.
convModel.evaluate(predictors_test, targets_test)
The prediction plots for the test set are below. Apart from the middle portion of run #18, the predictions seem quite accurate.
preds_test = convModel.predict(predictors_test)
plotPreds((preds_test,), targets_test, test_runs, run_indices_test)
for run in test_runs:
    print("Run number " + str(run))
    convModel.evaluate(predictors_test[run_indices_test == run, ], targets_test[run_indices_test == run, ])
To conclude, the convolutional model achieved the best performance on the validation set. It predicts the heart rate of the test runs with an average error of a bit under 10 BPM. This level of error can be deemed successful, taking into account the many factors affecting running heart rate that are not included in the predictors. Also, as the heart rate is measured with an optical sensor on the wrist in varying weather conditions, the measurements themselves carry some inherent error. Furthermore, the runs took place over the course of two months, so my aerobic fitness could differ significantly between the first and last runs of the dataset, affecting the heart rate measurements.